Correlating Human and Automatic Evaluation of a German Surface Realiser

Author

  • Aoife Cahill
Abstract

We examine correlations between native speaker judgements of automatically generated German text and the scores of automatic evaluation metrics. We look at a number of metrics from the MT and summarisation communities and find that, on a relative ranking task, most automatic metrics perform equally well and correlate fairly strongly with the human judgements. In contrast, on a naturalness judgement task, the General Text Matcher (GTM) tool correlates best overall, although correlation between the human judgements and the automatic metrics is generally quite weak.
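The abstract does not say which correlation coefficient was used. As a minimal sketch, assuming Pearson's r for graded scores (such as naturalness ratings) and Spearman's rho for rank-based comparisons, correlating human judgements with a metric's output could be computed as below. The function names and the scores in the usage lines are hypothetical illustrations, not code or data from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-sentence scores (illustrative only).
human = [4.0, 2.5, 3.0, 1.0, 5.0]        # e.g. mean native-speaker ratings
metric = [0.71, 0.40, 0.52, 0.30, 0.65]  # e.g. an automatic metric's scores
print(round(spearman(human, metric), 3))  # 0.9
```

Only the two highest-scored items swap order between the lists, so the rank correlation stays high even though the raw scales differ. On Python 3.12 or later, `statistics.correlation(x, y, method="ranked")` should give the same Spearman value, and `scipy.stats.spearmanr` is a common alternative; Kendall's tau is another option often used for ranking tasks.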

Similar resources

Realizing the Costs: Template-Based Surface Realisation in the GRAPH Approach to Referring Expression Generation

We describe a new realiser developed for the TUNA 2009 Challenge, and present its evaluation scores on the development set, showing a clear increase in performance compared to last year’s simple realiser.

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Automatic Detection and Localization of Surface Cracks in Continuously Cast Hot Steel Slabs Using Digital Image Analysis Techniques

Quality inspection is an indispensable part of modern industrial manufacturing. Steel as a major industry requires constant surveillance and supervision through its various stages of production. Continuous casting is a critical step in the steel manufacturing process in which molten steel is solidified into a semi-finished product called slab. Once the slab is released from the casting unit, th...

A Comparison of the Application of Different Website Accessibility Evaluation Methods (Case Study: Websites of the Ministries of the Government of the Islamic Republic of Iran)

Purpose: The present research aims to comparatively study different methods for evaluating the accessibility of websites and analyze the results of case study concerning websites of ministries of Iranian government, in order to indicate the strengths, weaknesses, and differences in evaluation findings by applying each of website accessibility methods. Methodology: In this paper, initially the ...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Publication date: 2009